Abstract

In many applications such as data compression, imaging or genomic data
analysis, it is important to approximate a given tensor by a tensor that is
sparsely representable. For matrices, i.e. 2-tensors, such a representation can
be obtained via the singular value decomposition, which allows one to compute the
best rank k approximations. For d-tensors with d > 2 many generalizations of
the singular value decomposition have been proposed to obtain low tensor rank
decompositions. In this paper we present a different approach which is
based on best subspace approximations, which provide an alternative
generalization of the singular value decomposition to tensors.
1 Introduction

In this paper we will consider data sparse approximations of tensors. We will discuss
a generalization of the singular value decomposition from matrices to tensors that
is an alternative to the Tucker decomposition [8, 10]. In order not to overload the
paper with technical details we will mainly discuss 3-tensors, but our approach will work
for arbitrary tensors.

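Let $\mathbb{F}$ be either the field of real numbers $\mathbb{R}$ or complex numbers $\mathbb{C}$. Denote by
$\mathbb{F}^{m_1\times\dots\times m_d} := \otimes_{j=1}^{d}\mathbb{F}^{m_j}$ the tensor product of $\mathbb{F}^{m_1},\dots,\mathbb{F}^{m_d}$.
$\mathcal{T} = [t_{i_1,\dots,i_d}] \in \mathbb{F}^{m_1\times\dots\times m_d}$ is called a d-tensor in the given tensor product. Note that the number
of coordinates of $\mathcal{T}$ is $N = m_1\cdots m_d$. A tensor $\mathcal{T}$ is called a sparsely representable
tensor if it can be represented with a number of coordinates that is much smaller than
$N$.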

The best known example of a sparsely representable 2-tensor is a low rank
approximation of a matrix $A \in \mathbb{F}^{m_1\times m_2}$. A rank k approximation of $A$ is given by
$A_{\mathrm{appr}} := \sum_{i=1}^{k} \mathbf{u}_i\mathbf{v}_i^\top$, which can be identified with $\sum_{i=1}^{k} \mathbf{u}_i\otimes\mathbf{v}_i$. To store $A_{\mathrm{appr}}$
we need only the 2k vectors $\mathbf{u}_1,\dots,\mathbf{u}_k \in \mathbb{F}^{m_1}$, $\mathbf{v}_1,\dots,\mathbf{v}_k \in \mathbb{F}^{m_2}$. The best rank k
approximation of $A \in \mathbb{F}^{m_1\times m_2}$ can be computed via the singular value decomposition,
abbreviated here as SVD [4].

The computation of the SVD requires $O(m_1 m_2^2 + m_2^3)$ operations and at least
$O(m_1 m_2)$ storage. Thus, if the dimensions $m_1$ and $m_2$ are very large, then the
computation of the SVD is often infeasible. In this case other types of low rank
approximations are considered, see e.g. [2, 3, 5].

For d-tensors with d > 2, however, the situation is rather unsatisfactory. It is
a major theoretical and computational problem to formulate good generalizations
of low rank approximation for tensors and to give efficient algorithms to compute
these approximations, see e.g. [8, 9, 10]. It is the goal of this paper to present and
analyze an alternative generalization of the SVD to tensors.

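A tensor $\mathcal{T} = [t_{i,j,k}] \in \mathbb{F}^{m_1\times m_2\times m_3}$ is called a rank 1 tensor, and denoted by
$\mathcal{T} = \mathbf{u}\otimes\mathbf{v}\otimes\mathbf{w}$, if $t_{i,j,k} = u_i v_j w_k$, where $\mathbf{u} = (u_1,\dots,u_{m_1})^\top$, $\mathbf{v} = (v_1,\dots,v_{m_2})^\top$,
$\mathbf{w} = (w_1,\dots,w_{m_3})^\top$. A tensor $\mathcal{T} \in \mathbb{F}^{m_1\times m_2\times m_3}$ is said to have rank k if $\mathcal{T}$ can be
represented as a sum of k rank 1 tensors, and cannot be represented as a sum of
k - 1 rank 1 tensors. Note that if $\mathcal{T}$ is a sum of k rank 1 tensors, then $\mathcal{T}$ can be
represented with at most $O(k(m_1 + m_2 + m_3))$ storage.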
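
The storage count above can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions: the helper name `rank_one`, the dimensions and the random factors are hypothetical and do not come from the paper.

```python
import numpy as np

def rank_one(u, v, w):
    """Rank 1 tensor u (x) v (x) w with entries t[i, j, k] = u[i] * v[j] * w[k]."""
    return np.einsum('i,j,k->ijk', u, v, w)

rng = np.random.default_rng(1)
m1, m2, m3, k = 5, 4, 3, 2   # illustrative sizes

# k triples of factor vectors (u_i, v_i, w_i)
factors = [(rng.standard_normal(m1),
            rng.standard_normal(m2),
            rng.standard_normal(m3)) for _ in range(k)]

# Sum of k rank 1 tensors, so T has rank at most k
T = sum(rank_one(u, v, w) for u, v, w in factors)

full_storage = T.size                    # m1 * m2 * m3 coordinates
factored_storage = k * (m1 + m2 + m3)    # coordinates of the factor vectors
```

For these sizes the factored form stores 24 numbers instead of 60; the gap widens quickly as the dimensions grow while k stays small.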

We denote by $\mathcal{R}(k; m_1, m_2, m_3)$ the set of tensors in $\mathbb{F}^{m_1\times m_2\times m_3}$ of rank k at
most. It is easy to show that $\mathcal{R}(1; m_1, m_2, m_3)$ is a closed set, more precisely an
algebraic variety, in $\mathbb{F}^{m_1\times m_2\times m_3}$. However, it is well known, see e.g. [1], that for
some values of $k \ge 2$, $\mathcal{R}(k; m_1, m_2, m_3)$ is not a closed set. ($\mathcal{R}(k; m_1, m_2, m_3)$ is
called a quasi-algebraic variety.)

Let $\|\cdot\|$ be a norm on $\mathbb{F}^{m_1\times m_2\times m_3}$. Then for $k \ge 2$ it is possible that the
minimization problem

$$\min_{\mathcal{X}\in\mathcal{R}(k; m_1, m_2, m_3)} \|\mathcal{T} - \mathcal{X}\| \tag{1.1}$$

does not have a minimal solution. This will happen if $\mathcal{T}$ has rank greater than k and
$\mathcal{T}$ lies in the closure of $\mathcal{R}(k; m_1, m_2, m_3)$. Hence, any algorithm which tries to find
a solution to the minimization problem (1.1) will fail for certain tensors $\mathcal{T}$. Since
$\mathcal{R}(k; m_1, m_2, m_3)$ is a closed set for k = 1, i.e. for the best approximation by a rank
1 tensor, (1.1) will always have a minimal solution in that case.

The object of this paper is to introduce a new family of sparsely representable
approximations to tensors, which we call best subspace tensor approximation (BSTA)
of a given tensor $\mathcal{T}$. As for the best rank 1 approximation, we will show that the
BSTA always exists. Due to this fact, we think that in the case that the norm
$\|\cdot\|$ on $\mathbb{F}^{m_1\times m_2\times m_3}$ is the norm induced by the inner products on the vector spaces
$\mathbb{F}^{m_1}, \mathbb{F}^{m_2}, \mathbb{F}^{m_3}$, the BSTA is an appropriate generalization of the SVD, see [8]
for other generalizations of the SVD for tensors. A similar approach was suggested
recently by Khoromskij [7]. We will also present a numerical algorithm to compute
the best subspace tensor approximation that is based on the computation of singular
value decompositions for matrices.

Unfortunately this numerical algorithm is extremely expensive. In order to
reduce the complexity, in the last section we consider a procedure that is based on
the recently suggested fast SVD [3].
2 Notation and preliminary results

We denote by a bold capital letter a finite dimensional vector space $\mathbf{U}$ over the field
$\mathbb{F}$. A vector $\mathbf{u} \in \mathbf{U}$ is denoted by a bold face lower case letter. A matrix $A \in \mathbb{F}^{m_1\times m_2}$
is denoted by a capital letter $A$, and we let either $A = [a_{i,j}]_{i=j=1}^{m_1,m_2}$ or simply $A = [a_{i,j}]$.
A 3-tensor array $\mathcal{T} \in \mathbb{F}^{m_1\times m_2\times m_3}$ will be denoted by a capital calligraphic letter.
So either $\mathcal{T} = [t_{i,j,k}]_{i=j=k=1}^{m_1,m_2,m_3}$ or simply $\mathcal{T} = [t_{i,j,k}]$. For a positive integer n we also
use the convenient notation $\langle n\rangle := \{1, 2, \dots, n\}$.

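Let $\mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3$ be three vector spaces over $\mathbb{F}$ with $m_j := \dim \mathbf{U}_j$, j = 1, 2, 3,
and let $\mathbf{u}_{1,j},\dots,\mathbf{u}_{m_j,j}$ be a basis of $\mathbf{U}_j$ for j = 1, 2, 3. Then $\mathbf{U} := \mathbf{U}_1\otimes\mathbf{U}_2\otimes\mathbf{U}_3$ is
the tensor product of $\mathbf{U}_1$, $\mathbf{U}_2$, and $\mathbf{U}_3$; $\mathbf{U}$ is a vector space of dimension $m_1 m_2 m_3$,
and

$$\mathbf{u}_{i_1,1}\otimes\mathbf{u}_{i_2,2}\otimes\mathbf{u}_{i_3,3}, \quad i_j = 1,\dots,m_j, \quad j = 1, 2, 3, \tag{2.1}$$

is a basis of $\mathbf{U}$.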

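A 3-tensor $\tau$ is a vector in $\mathbf{U}$ and it has a representation

$$\tau = \sum_{i_1=i_2=i_3=1}^{m_1,m_2,m_3} t_{i_1,i_2,i_3}\,\mathbf{u}_{i_1,1}\otimes\mathbf{u}_{i_2,2}\otimes\mathbf{u}_{i_3,3}, \tag{2.2}$$

in the basis (2.1). If the basis (2.1) is fixed then $\tau$ is identified with
$\mathcal{T} = [t_{i_1,i_2,i_3}] \in \mathbb{F}^{m_1\times m_2\times m_3}$.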

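Recall that $\mathbf{x}_1\otimes\mathbf{x}_2\otimes\mathbf{x}_3$, where $\mathbf{x}_i \in \mathbf{U}_i$, i = 1, 2, 3, is called a rank 1 tensor. (Usually
one assumes that all $\mathbf{x}_i \ne 0$. Otherwise $0 = \mathbf{x}_1\otimes\mathbf{x}_2\otimes\mathbf{x}_3$ is called a rank 0 tensor.)
Then (2.2) is a decomposition of $\tau$ as a sum of at most $m_1 m_2 m_3$ rank 1 tensors,
as $t_{i_1,i_2,i_3}\mathbf{u}_{i_1,1}\otimes\mathbf{u}_{i_2,2}\otimes\mathbf{u}_{i_3,3} = (t_{i_1,i_2,i_3}\mathbf{u}_{i_1,1})\otimes\mathbf{u}_{i_2,2}\otimes\mathbf{u}_{i_3,3}$. A decomposition of
$\tau \in \mathbf{U}\setminus\{0\}$ as a sum of rank 1 tensors is given by

$$\tau = \sum_{i=1}^{k} \mathbf{x}_i\otimes\mathbf{y}_i\otimes\mathbf{z}_i, \quad \mathbf{x}_i \in \mathbf{U}_1,\ \mathbf{y}_i \in \mathbf{U}_2,\ \mathbf{z}_i \in \mathbf{U}_3, \quad i = 1,\dots,k. \tag{2.3}$$

The minimal k for which the above equality holds is called the rank of the tensor
$\tau$. This definition is completely analogous to the definition of the rank for a matrix
$A = [a_{i_1,i_2}] \in \mathbb{F}^{m_1\times m_2}$, which can be identified with the 2-tensor
$\sum_{i_1=i_2=1}^{m_1,m_2} a_{i_1,i_2}\,\mathbf{u}_{i_1,1}\otimes\mathbf{u}_{i_2,2} \in \mathbf{U}_1\otimes\mathbf{U}_2$.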

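For $j \in \{1,2,3\}$ denote by $j^c := \{p,q\} = \{1,2,3\}\setminus\{j\}$, where $1 \le p < q \le 3$,
and set $\mathbf{U}_{j^c} = \mathbf{U}_{\{p,q\}} := \mathbf{U}_p\otimes\mathbf{U}_q$.

A tensor $\tau \in \mathbf{U}_1\otimes\mathbf{U}_2\otimes\mathbf{U}_3$ induces a linear transformation $\tau(j) : \mathbf{U}_{j^c} \to \mathbf{U}_j$
as follows. Suppose that $\mathbf{u}_{1,\ell},\dots,\mathbf{u}_{m_\ell,\ell}$ is a basis in $\mathbf{U}_\ell$ for $\ell = 1, 2, 3$. Then any
$\mathbf{v} \in \mathbf{U}_{j^c}$ is of the form

$$\mathbf{v} = \sum_{i_p=i_q=1}^{m_p,m_q} v_{i_p,i_q}\,\mathbf{u}_{i_p,p}\otimes\mathbf{u}_{i_q,q}$$

and the application of $\tau(j)$ is given by

$$\tau(j)\mathbf{v} = \sum_{i_j=1}^{m_j}\Big(\sum_{i_p=i_q=1}^{m_p,m_q} t_{i_1,i_2,i_3}\,v_{i_p,i_q}\Big)\mathbf{u}_{i_j,j}. \tag{2.4}$$

Then $\mathrm{rank}_j(\tau)$ is the rank of the operator $\tau(j)$. Equivalently, let $A(j) = [a_{\ell,i_j}] \in
\mathbb{F}^{m_p m_q\times m_j}$, where each integer $\ell \in \langle m_p m_q\rangle$ corresponds to a pair $(i_p, i_q)$, for $i_p =
1,\dots,m_p$, $i_q = 1,\dots,m_q$, and $i_j \in \langle m_j\rangle$. (For example we may arrange the pairs
$(i_p, i_q)$ in the lexicographical order. Then $i_p = \lceil \ell/m_q\rceil$ and $i_q = \ell - (i_p - 1)m_q$.) Set
$a_{\ell,i_j} = t_{i_1,i_2,i_3}$. Then $\mathrm{rank}_j(\tau) = \mathrm{rank}\,A(j)$.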
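
The unfolding $A(j)$ with lexicographically ordered row pairs can be sketched in NumPy as follows. This is a hedged illustration: the function names `unfold` and `mode_rank` are hypothetical, indices are 0-based in the code, and the example tensor is chosen only to show that the three ranks may differ.

```python
import numpy as np

def unfold(T, j):
    """Matrix A(j): rows are pairs (i_p, i_q), p < q, in lexicographic order;
    columns are indexed by i_j (0-based mode index j)."""
    # Move mode j to the end; the remaining modes keep their order p < q,
    # and a C-order reshape enumerates (i_p, i_q) lexicographically.
    return np.moveaxis(T, j, -1).reshape(-1, T.shape[j])

def mode_rank(T, j):
    """rank_j of the tensor, computed as the matrix rank of A(j)."""
    return np.linalg.matrix_rank(unfold(T, j))

# T = e1 (x) e1 (x) e1 + e2 (x) e2 (x) e1 in a 2 x 2 x 2 space
T = np.zeros((2, 2, 2))
T[0, 0, 0] = 1.0
T[1, 1, 0] = 1.0
ranks = [mode_rank(T, j) for j in range(3)]
```

With 0-based indices the row of the pair $(i_p, i_q)$ is `i_p * m_q + i_q`, which is the 0-based counterpart of the formula for $\ell$ given above.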

The following proposition is straightforward. 

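Proposition 2.1 Let $\tau \in \mathbf{U}_1\otimes\mathbf{U}_2\otimes\mathbf{U}_3$ be given by (2.2). Fix $j \in \{1,2,3\}$ and
set $j^c = \{p,q\}$. Let $T_{i_j,j} := [t_{i_1,i_2,i_3}]_{i_p=i_q=1}^{m_p,m_q} \in \mathbb{F}^{m_p\times m_q}$, $i_j = 1,\dots,m_j$. Then $\mathrm{rank}_j(\tau)$
is the dimension of the subspace of $m_p\times m_q$ matrices spanned by $T_{1,j},\dots,T_{m_j,j}$.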

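Assume that each $\mathbf{U}_j$ is an inner product space, with the inner product $\langle\cdot,\cdot\rangle_j$
for j = 1, 2, 3. Let $\mathbf{u}_{1,j},\dots,\mathbf{u}_{m_j,j}$, j = 1, 2, 3, be an orthonormal basis in $\mathbf{U}_j$ with
respect to $\langle\cdot,\cdot\rangle_j$. Define an inner product on $\mathbf{U}$, denoted by $\langle\cdot,\cdot\rangle$, by assuming that
the basis (2.1) is an orthonormal basis in $\mathbf{U}$. It is straightforward to show that
the above inner product does not depend on the choice of the orthonormal bases in
$\mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3$. The so defined inner product in $\mathbf{U}$ is called the induced inner product
and we have the identity

$$\langle\mathbf{x}\otimes\mathbf{y}\otimes\mathbf{z},\ \mathbf{u}\otimes\mathbf{v}\otimes\mathbf{w}\rangle = \langle\mathbf{x},\mathbf{u}\rangle_1\,\langle\mathbf{y},\mathbf{v}\rangle_2\,\langle\mathbf{z},\mathbf{w}\rangle_3.$$

On $\mathbb{F}^{m_1\times m_2\times m_3}$ the standard inner product $\langle\mathcal{X},\mathcal{Y}\rangle$ is given by $\sum_{i=j=k=1}^{m_1,m_2,m_3} x_{i,j,k}\,\bar{y}_{i,j,k}$,
where $\mathcal{X} = [x_{i,j,k}]$, $\mathcal{Y} = [y_{i,j,k}]$. This inner product is induced by the standard inner
products on $\mathbb{F}^{m_1}, \mathbb{F}^{m_2}, \mathbb{F}^{m_3}$. So $\|\mathcal{X}\| = \big(\sum_{i=j=k=1}^{m_1,m_2,m_3} |x_{i,j,k}|^2\big)^{1/2}$ is the Hilbert-Schmidt
norm on $\mathbb{F}^{m_1\times m_2\times m_3}$.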

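We denote by $\mathrm{Gr}(p, \mathbb{F}^n)$ the set of all p-dimensional subspaces of $\mathbb{F}^n$. It is well
known that $\mathrm{Gr}(p, \mathbb{F}^n)$ is a closed set, more precisely an algebraic variety, called the
Grassmannian of $\mathbb{F}^n$ [6].

Definition 2.2 Let $p \in \langle m_1\rangle$, $q \in \langle m_2\rangle$, $r \in \langle m_3\rangle$. Denote by
$\mathrm{Gr}(p,\mathbb{F}^{m_1})\otimes\mathrm{Gr}(q,\mathbb{F}^{m_2})\otimes\mathrm{Gr}(r,\mathbb{F}^{m_3}) \subset \mathrm{Gr}(pqr, \mathbb{F}^{m_1\times m_2\times m_3})$ the set of all pqr-dimensional
subspaces in $\mathbb{F}^{m_1\times m_2\times m_3}$ of the form $\mathbf{X}\otimes\mathbf{Y}\otimes\mathbf{Z}$, where $\mathbf{X} \in \mathrm{Gr}(p,\mathbb{F}^{m_1})$, $\mathbf{Y} \in \mathrm{Gr}(q,\mathbb{F}^{m_2})$,
$\mathbf{Z} \in \mathrm{Gr}(r,\mathbb{F}^{m_3})$.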

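Clearly, $\mathrm{Gr}(p,\mathbb{F}^{m_1})\otimes\mathrm{Gr}(q,\mathbb{F}^{m_2})\otimes\mathrm{Gr}(r,\mathbb{F}^{m_3})$ is a closed subvariety of
$\mathrm{Gr}(pqr, \mathbb{F}^{m_1\times m_2\times m_3})$. Define by $\mathrm{dist}(\mathcal{T}, S, \|\cdot\|) := \inf_{\mathcal{X}\in S}\|\mathcal{T} - \mathcal{X}\|$ the distance of
$\mathcal{T}$ to a set $S \subset \mathbb{F}^{m_1\times m_2\times m_3}$ with respect to the norm $\|\cdot\|$. Then the best (p, q, r)
subspace approximation of $\mathcal{T} \in \mathbb{F}^{m_1\times m_2\times m_3}$ is given by

$$\min_{\mathbf{X}\otimes\mathbf{Y}\otimes\mathbf{Z}\,\in\,\mathrm{Gr}(p,\mathbb{F}^{m_1})\otimes\mathrm{Gr}(q,\mathbb{F}^{m_2})\otimes\mathrm{Gr}(r,\mathbb{F}^{m_3})} \mathrm{dist}(\mathcal{T}, \mathbf{X}\otimes\mathbf{Y}\otimes\mathbf{Z}, \|\cdot\|), \tag{2.5}$$

and we denote the subspace where the minimum is achieved by $\mathbf{X}^*\otimes\mathbf{Y}^*\otimes\mathbf{Z}^*$ and
the minimal tensor by $\mathcal{X}^* \in \mathbf{X}^*\otimes\mathbf{Y}^*\otimes\mathbf{Z}^*$, i.e. we have

$$\mathrm{dist}(\mathcal{T}, \mathbf{X}^*\otimes\mathbf{Y}^*\otimes\mathbf{Z}^*, \|\cdot\|) = \|\mathcal{T} - \mathcal{X}^*\|. \tag{2.6}$$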

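Let $\ell_1 \in \langle m_1\rangle$, $\ell_2 \in \langle m_2\rangle$, $\ell_3 \in \langle m_3\rangle$ and suppose that $\mathbf{U}_1 \in \mathrm{Gr}(\ell_1,\mathbb{F}^{m_1})$, $\mathbf{U}_2 \in
\mathrm{Gr}(\ell_2,\mathbb{F}^{m_2})$, $\mathbf{U}_3 \in \mathrm{Gr}(\ell_3,\mathbb{F}^{m_3})$. Choose orthonormal bases

$$\mathbf{u}_{1,1},\dots,\mathbf{u}_{m_1,1} \in \mathbb{F}^{m_1}, \quad \mathbf{u}_{1,2},\dots,\mathbf{u}_{m_2,2} \in \mathbb{F}^{m_2}, \quad \mathbf{u}_{1,3},\dots,\mathbf{u}_{m_3,3} \in \mathbb{F}^{m_3},$$

such that $\mathbf{u}_{1,j},\dots,\mathbf{u}_{\ell_j,j}$ is an orthonormal basis in $\mathbf{U}_j$ for j = 1, 2, 3. Then for
$\tau \in \mathbb{F}^{m_1\times m_2\times m_3}$ let

$$t_{i,j,k} = \langle\tau,\ \mathbf{u}_{i,1}\otimes\mathbf{u}_{j,2}\otimes\mathbf{u}_{k,3}\rangle, \quad i = 1,\dots,m_1,\ j = 1,\dots,m_2,\ k = 1,\dots,m_3. \tag{2.7}$$

So $\mathcal{T} = [t_{i,j,k}]$ is the representation of $\tau$ in the orthonormal basis. Then

$$P_{\mathbf{U}_1\otimes\mathbf{U}_2\otimes\mathbf{U}_3}(\tau) = \xi = \sum_{(i,j,k)\in\langle\ell_1\rangle\times\langle\ell_2\rangle\times\langle\ell_3\rangle} t_{i,j,k}\,\mathbf{u}_{i,1}\otimes\mathbf{u}_{j,2}\otimes\mathbf{u}_{k,3} \tag{2.8}$$

is the orthogonal projection of $\tau$ on the subspace $\mathbf{U}_1\otimes\mathbf{U}_2\otimes\mathbf{U}_3$. Thus

$$\mathrm{dist}(\tau, \mathbf{U}_1\otimes\mathbf{U}_2\otimes\mathbf{U}_3) = \|\tau - \xi\| = \Big(\sum_{(i,j,k)\in\langle m_1\rangle\times\langle m_2\rangle\times\langle m_3\rangle\setminus\langle\ell_1\rangle\times\langle\ell_2\rangle\times\langle\ell_3\rangle} |t_{i,j,k}|^2\Big)^{1/2} \tag{2.9}$$

is the distance with respect to the Hilbert-Schmidt norm on $\mathbb{F}^{m_1\times m_2\times m_3}$. Clearly,
we have

$$\|\tau\|^2 = \|P_{\mathbf{U}_1\otimes\mathbf{U}_2\otimes\mathbf{U}_3}(\tau)\|^2 + \mathrm{dist}(\tau, \mathbf{U}_1\otimes\mathbf{U}_2\otimes\mathbf{U}_3)^2. \tag{2.10}$$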
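
The projection (2.8) and the identity (2.10) can be checked numerically. The following is a minimal sketch for the real case only; the helper names `orthonormal_basis` and `project`, the dimensions and the random subspaces are assumptions made for illustration.

```python
import numpy as np

def orthonormal_basis(m, l, rng):
    """Columns form an orthonormal basis of a random l-dimensional subspace of R^m."""
    Q, _ = np.linalg.qr(rng.standard_normal((m, l)))
    return Q

def project(T, U1, U2, U3):
    """Orthogonal projection of T onto U1 (x) U2 (x) U3 (real case).

    Each Uj holds an orthonormal basis of the j-th subspace in its columns."""
    # Coordinates t_{a,b,c} with respect to the subspace bases, as in (2.7)
    core = np.einsum('ijk,ia,jb,kc->abc', T, U1, U2, U3)
    # Linear combination of the basis tensors, as in (2.8)
    return np.einsum('abc,ia,jb,kc->ijk', core, U1, U2, U3)

rng = np.random.default_rng(2)
T = rng.standard_normal((6, 5, 4))
U1, U2, U3 = (orthonormal_basis(m, l, rng) for m, l in zip(T.shape, (3, 2, 2)))

xi = project(T, U1, U2, U3)
dist = np.linalg.norm(T - xi)              # Hilbert-Schmidt distance, cf. (2.9)
lhs = np.linalg.norm(T) ** 2
rhs = np.linalg.norm(xi) ** 2 + dist ** 2  # right-hand side of (2.10)
```

Up to rounding, `lhs` and `rhs` agree, which is the Pythagorean identity (2.10) for this particular choice of subspaces.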

3 The SVD as best subspace tensor approximation 

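In this section we will illustrate that the SVD allows one to compute the best subspace
tensor approximation for 2-tensors.

Let us view $m_1\times m_2$ matrices as 2-tensors. Here $\mathbf{x}\otimes\mathbf{y}$ corresponds to the
matrix $\mathbf{x}\mathbf{y}^\top$. A tensor $\tau \in \mathbb{F}^{m_1}\otimes\mathbb{F}^{m_2}$ can be viewed as a linear transformation
$\tau : \mathbb{F}^{m_2} \to \mathbb{F}^{m_1}$ as follows. First observe that a rank 1 tensor $\mathbf{x}\otimes\mathbf{y}$ gives rise
to the linear transformation $(\mathbf{x}\otimes\mathbf{y})(\mathbf{z}) = \langle\mathbf{z},\mathbf{y}\rangle\mathbf{x}$. Now extend this notion to any
$\tau \in \mathbb{F}^{m_1}\otimes\mathbb{F}^{m_2}$, which is a sum of rank 1 tensors.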

We claim that the best rank k approximation of $\tau$ is obtained as the solution to
the minimization problem

$$\min_{\mathbf{X}\in\mathrm{Gr}(k,\mathbb{F}^{m_1}),\ \mathbf{Y}\in\mathrm{Gr}(k,\mathbb{F}^{m_2})} \mathrm{dist}(\tau, \mathbf{X}\otimes\mathbf{Y}) = \mathrm{dist}(\tau, \mathbf{X}^*\otimes\mathbf{Y}^*), \tag{3.1}$$

where $\mathbf{X}^*, \mathbf{Y}^*$ are the subspaces spanned by the k left and right singular vectors of
$\tau$ associated with the largest k singular values.

Indeed, suppose that the minimum in (3.1) is achieved for some tensor $\alpha \in
\mathbf{X}^*\otimes\mathbf{Y}^*$, so $\mathrm{rank}\,\alpha \le k$. Hence the best approximation by a rank k tensor is
not worse than the minimum of (3.1). On the other hand, any rank k tensor is an
element of some $\mathbf{X}\otimes\mathbf{Y}$ with $\mathbf{X} \in \mathrm{Gr}(k,\mathbb{F}^{m_1})$, $\mathbf{Y} \in \mathrm{Gr}(k,\mathbb{F}^{m_2})$. So the minimum
in (3.1) is not bigger than the error of the best rank k approximation. But the best rank k
approximation to a given 2-tensor is obtained by the SVD [4].
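
The claim can be illustrated numerically for a real matrix. The sketch below is an assumption-laden example (random data, illustrative variable names): it projects a matrix onto $\mathbf{X}^*\otimes\mathbf{Y}^*$ built from its leading singular vectors and compares the result with the truncated SVD and with a randomly chosen pair of subspaces.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 5))
k = 2

U, s, Vh = np.linalg.svd(A, full_matrices=False)
X = U[:, :k]       # orthonormal basis of X*: k leading left singular vectors
Y = Vh[:k, :].T    # orthonormal basis of Y*: k leading right singular vectors

# Orthogonal projection of A onto X* (x) Y* (real case)
proj = X @ (X.T @ A @ Y) @ Y.T
# Truncated SVD, i.e. the best rank k approximation
trunc = (X * s[:k]) @ Y.T
dist_opt = np.linalg.norm(A - proj)

# A random pair of k-dimensional subspaces for comparison
Xr, _ = np.linalg.qr(rng.standard_normal((7, k)))
Yr, _ = np.linalg.qr(rng.standard_normal((5, k)))
dist_rand = np.linalg.norm(A - Xr @ (Xr.T @ A @ Yr) @ Yr.T)
```

Here `proj` coincides with `trunc`, and `dist_opt` equals the square root of the sum of the squared discarded singular values, so no other pair of subspaces can give a smaller distance.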

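We now consider the following approximation problem for 2-tensors, which is
equivalent to the corresponding matrix problem.

Lemma 3.1 Let $\mathbf{Y} \subset \mathbb{F}^{m_2}$ be a given $\ell \in \langle m_2\rangle$ dimensional subspace. For
$i \in \langle m_1\rangle$ and $\tau \in \mathbb{F}^{m_1}\otimes\mathbb{F}^{m_2}$ consider the minimization problem of finding $\mathbf{X}^* \in
\mathrm{Gr}(i,\mathbb{F}^{m_1})$ such that

$$\min_{\mathbf{X}\in\mathrm{Gr}(i,\mathbb{F}^{m_1})} \mathrm{dist}(\tau, \mathbf{X}\otimes\mathbf{Y}) = \mathrm{dist}(\tau, \mathbf{X}^*\otimes\mathbf{Y}). \tag{3.2}$$

View $\tau$ as a linear mapping from $\mathbb{F}^{m_2}$ to $\mathbb{F}^{m_1}$. If $\dim(\tau\mathbf{Y}) \le i$ then $\mathbf{X}^*$ is any
subspace that contains $\tau\mathbf{Y}$. If $\dim(\tau\mathbf{Y}) > i$ then $\mathbf{X}^*$ is the subspace spanned by the
left singular vectors associated with the i largest singular values of $\tau|_{\mathbf{Y}}$ (which is a
linear map $\tau|_{\mathbf{Y}} : \mathbf{Y} \to \mathbb{F}^{m_1}$).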

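Proof. Choose the standard orthonormal basis $\mathbf{e}_1,\dots,\mathbf{e}_{m_1} \in \mathbb{F}^{m_1}$ and an
orthonormal basis $\mathbf{y}_1,\dots,\mathbf{y}_{m_2} \in \mathbb{F}^{m_2}$ such that $\mathbf{Y} = \mathrm{span}(\mathbf{y}_1,\dots,\mathbf{y}_\ell)$. Let $\mathbf{Y}^\perp =
\mathrm{span}(\mathbf{y}_{\ell+1},\dots,\mathbf{y}_{m_2})$. Then $\mathbb{F}^{m_1}\otimes\mathbb{F}^{m_2} = \mathbb{F}^{m_1}\otimes\mathbf{Y} \oplus \mathbb{F}^{m_1}\otimes\mathbf{Y}^\perp$ is an orthogonal
decomposition of $\mathbb{F}^{m_1}\otimes\mathbb{F}^{m_2}$. This means that we can write $\tau$ as

$$\tau = \phi + \psi, \quad \phi = P_{\mathbb{F}^{m_1}\otimes\mathbf{Y}}(\tau), \quad \psi = P_{\mathbb{F}^{m_1}\otimes\mathbf{Y}^\perp}(\tau), \quad \|\tau\|^2 = \|\phi\|^2 + \|\psi\|^2.$$

Since we require $\mathbf{X}\otimes\mathbf{Y} \subset \mathbb{F}^{m_1}\otimes\mathbf{Y}$ it follows that the minimization problem (3.2)
is equivalent to the minimization problem

$$\min_{\mathbf{X}\in\mathrm{Gr}(i,\mathbb{F}^{m_1})} \mathrm{dist}(\phi, \mathbf{X}\otimes\mathbf{Y}) = \mathrm{dist}(\phi, \mathbf{X}^*\otimes\mathbf{Y}). \tag{3.3}$$

Observe next that $\phi$, viewed as a linear transformation $\phi : \mathbf{Y} \to \mathbb{F}^{m_1}$, is equal to
$\tau|_{\mathbf{Y}}$. The classical result for matrices implies that the best rank i approximation of $\phi$
is given via the left singular vectors associated to the largest i singular values of $\phi$. □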

In this section we have shown that the best subspace tensor approximation for
2-tensors is obtained via the singular value decomposition. This immediately suggests
using it as a generalization of the SVD for higher tensors.



4 Best subspace tensor approximations for 3-tensors 

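In this section we study the best subspace tensor approximation for 3-tensors. Let
$\tau \in \mathbb{F}^{m_1\times m_2\times m_3}$ and assume that $p \in \langle m_1\rangle$, $q \in \langle m_2\rangle$, $r \in \langle m_3\rangle$ and consider the
minimization problem

$$\min_{\mathbf{X}\otimes\mathbf{Y}\otimes\mathbf{Z}\,\in\,\mathrm{Gr}(p,\mathbb{F}^{m_1})\otimes\mathrm{Gr}(q,\mathbb{F}^{m_2})\otimes\mathrm{Gr}(r,\mathbb{F}^{m_3})} \mathrm{dist}(\tau, \mathbf{X}\otimes\mathbf{Y}\otimes\mathbf{Z}) \tag{4.1}$$

and suppose that this minimum is achieved for the subspace $\mathbf{X}^*\otimes\mathbf{Y}^*\otimes\mathbf{Z}^*$ with the
tensor $\xi$, i.e.

$$\mathrm{dist}(\tau, \mathbf{X}^*\otimes\mathbf{Y}^*\otimes\mathbf{Z}^*) = \|\tau - \xi\|, \quad \xi \in \mathbf{X}^*\otimes\mathbf{Y}^*\otimes\mathbf{Z}^*.$$

In view of (2.10) this minimization problem is equivalent to the maximization
problem

$$\max_{\mathbf{X}\otimes\mathbf{Y}\otimes\mathbf{Z}\,\in\,\mathrm{Gr}(p,\mathbb{F}^{m_1})\otimes\mathrm{Gr}(q,\mathbb{F}^{m_2})\otimes\mathrm{Gr}(r,\mathbb{F}^{m_3})} \|P_{\mathbf{X}\otimes\mathbf{Y}\otimes\mathbf{Z}}(\tau)\|^2 = \|P_{\mathbf{X}^*\otimes\mathbf{Y}^*\otimes\mathbf{Z}^*}(\tau)\|^2. \tag{4.2}$$

To simplify our exposition we state our results for $\mathbb{F} = \mathbb{R}, \mathbb{C}$, but we give the proofs
only for $\mathbb{F} = \mathbb{R}$.

To solve the minimization problem, we study the critical points (i.e. the points
of vanishing gradient) of $\|P_{\mathbf{X}\otimes\mathbf{Y}\otimes\mathbf{Z}}(\tau)\|^2$ on $\mathrm{Gr}(p,\mathbb{F}^{m_1})\otimes\mathrm{Gr}(q,\mathbb{F}^{m_2})\otimes\mathrm{Gr}(r,\mathbb{F}^{m_3})$.
To do that we need the following lemma which follows from the Courant-Fischer
theorem, see e.g. [4]. In the following, we use $\mathrm{Fr}(i,\mathbb{F}^{m_1})$ to denote the manifold of
all sets of i orthonormal vectors $\{\mathbf{x}_1,\dots,\mathbf{x}_i\} \subset \mathbb{F}^{m_1}$.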
Lemma 4.1 Let $B \in \mathbb{F}^{m_1\times m_1}$ be a Hermitian matrix. Let a functional
$g_B : \mathrm{Fr}(i,\mathbb{R}^{m_1}) \to \mathbb{R}$ be given by $g_B(\mathbf{x}_1,\dots,\mathbf{x}_i) = \sum_{j=1}^{i} \mathbf{x}_j^\top B\,\mathbf{x}_j$. Then the critical
points of $g_B$ are all sets $\{\mathbf{x}_1,\dots,\mathbf{x}_i\}$ such that $\mathrm{span}(\mathbf{x}_1,\dots,\mathbf{x}_i)$ contains i linearly
independent eigenvectors of $B$.

Proof. We prove the lemma by induction on i. For i = 1 we have $g_B(\mathbf{x}) =
\mathbf{x}^\top B\mathbf{x}$ (note that $\|\mathbf{x}\| = 1$). Then by the Courant-Fischer Min-Max characterization,
see e.g. [4], $\mathbf{x} \ne 0$ is a critical point if and only if $\mathbf{x}$ is an eigenvector of $B$.

If $\mathbf{x}_1,\dots,\mathbf{x}_i$ are eigenvectors of $B$ it is straightforward to see that $\{\mathbf{x}_1,\dots,\mathbf{x}_i\}$ is
a critical point of $g_B$. Indeed, consider a variation $\mathbf{x}_\ell(t) = \mathbf{x}_\ell + t\mathbf{u}_\ell + t\mathbf{v}_\ell + O(t^2)$, $\ell =
1,\dots,i$, where $\mathbf{u}_\ell \in \mathrm{span}(\mathbf{x}_1,\dots,\mathbf{x}_i)$, $\mathbf{v}_\ell \in \mathrm{span}(\mathbf{x}_1,\dots,\mathbf{x}_i)^\perp$. Then the contribution
involving $\mathbf{u}_1,\dots,\mathbf{u}_i$ is quadratic in t. Since $\mathbf{v}_\ell^\top\mathbf{x}_\ell = 0$, $\ell = 1,\dots,i$, it follows that
the contribution in $\mathbf{v}_1,\dots,\mathbf{v}_i$ is also quadratic in t. It remains to show that if
$\{\mathbf{x}_1,\dots,\mathbf{x}_i\}$ is a critical point of $g_B$ then $\mathrm{span}(\mathbf{x}_1,\dots,\mathbf{x}_i)$ is spanned by i eigenvectors
of $B$.

Suppose that the assertion holds for i = k - 1 and assume that $i = k \le m_1$. If
$k = m_1$ then the assertion is clear because the whole space is spanned by eigenvectors
of $B$. So let $k < m_1$. Note that if $\{\mathbf{y}_1,\dots,\mathbf{y}_i\} \in \mathrm{Fr}(i,\mathbb{R}^{m_1})$ and $\mathrm{span}(\mathbf{y}_1,\dots,\mathbf{y}_i) =
\mathrm{span}(\mathbf{x}_1,\dots,\mathbf{x}_i)$ then $g_B(\mathbf{x}_1,\dots,\mathbf{x}_i) = g_B(\mathbf{y}_1,\dots,\mathbf{y}_i)$. So we may assume w.l.o.g.
that the matrix

$$C = [\mathbf{x}_s^\top B\,\mathbf{x}_t]_{s,t=1}^{i}$$

is diagonal. Furthermore, we may assume that $\mathbf{x}_s = \mathbf{e}_s$, $s = 1,\dots,i$. The induction
hypothesis states that for any $k \in \{i+1,\dots,m_1\}$ the symmetric matrix $B_k$, obtained
by erasing the k-th row and column of $B$, is a direct sum of $C$ and the corresponding other
block. Hence $B = C\oplus C'$ and the assertion follows. □

We immediately have the following corollary. 

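Corollary 4.2 Let $\alpha \in \mathbb{F}^{m_1}\otimes\mathbb{F}^{m_2}$ and suppose that i is an integer in the
interval $[1, m_1]$. Then $\mathbf{U} \in \mathrm{Gr}(i,\mathbb{F}^{m_1})$ is a critical point of the functional
$\|P_{\mathbf{X}\otimes\mathbb{F}^{m_2}}(\alpha)\|^2 : \mathrm{Gr}(i,\mathbb{F}^{m_1}) \to [0,\infty)$ if and only if $\mathbf{U}$ is spanned by some i left
singular vectors of the induced dual operator $\alpha : \mathbb{F}^{m_2} \to \mathbb{F}^{m_1}$. (Here some singular
vectors may correspond to the singular value 0.)

Proof. Represent $\alpha$ by $A \in \mathbb{R}^{m_1\times m_2}$ and let $B = AA^\top$. Let $\mathbf{X} \in \mathrm{Gr}(i,\mathbb{R}^{m_1})$
and suppose that $\{\mathbf{x}_1,\dots,\mathbf{x}_i\} \in \mathrm{Fr}(i,\mathbb{R}^{m_1})$ is a basis of $\mathbf{X}$. Then $\|P_{\mathbf{X}\otimes\mathbb{F}^{m_2}}(\alpha)\|^2 =
g_B(\mathbf{x}_1,\dots,\mathbf{x}_i)$, and the result follows from Lemma 4.1. □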
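
The identity used in this proof, $\|P_{\mathbf{X}\otimes\mathbb{F}^{m_2}}(\alpha)\|^2 = g_B(\mathbf{x}_1,\dots,\mathbf{x}_i)$ with $B = AA^\top$, can be checked on a small real example. The sketch below uses random data and illustrative names; it is not part of the original argument.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))   # represents alpha in R^{m1 x m2}
B = A @ A.T
i = 2

# Orthonormal basis x_1, ..., x_i of a random subspace X (columns of X)
X, _ = np.linalg.qr(rng.standard_normal((6, i)))

# Projection onto X (x) F^{m2} acts on the matrix as A -> X X^T A
proj_norm_sq = np.linalg.norm(X @ (X.T @ A)) ** 2
g_B = sum(x @ B @ x for x in X.T)   # sum of x_j^T B x_j

# For the i leading left singular vectors the value is the sum of the
# i largest squared singular values
U, s, _ = np.linalg.svd(A, full_matrices=False)
g_B_top = sum(u @ B @ u for u in U[:, :i].T)
```

Both `proj_norm_sq` and `g_B` agree up to rounding, and `g_B_top` is an upper bound for `g_B` over all i-dimensional subspaces.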

We will now construct projections of 3-tensors to 2-tensors, which we can use to 
compute best subspace approximations. 

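Let $\tau \in \mathbb{F}^{m_1}\otimes\mathbb{F}^{m_2}\otimes\mathbb{F}^{m_3}$ and $\mathbf{X} \in \mathrm{Gr}(p,\mathbb{F}^{m_1})$, $\mathbf{Y} \in \mathrm{Gr}(q,\mathbb{F}^{m_2})$, $\mathbf{Z} \in \mathrm{Gr}(r,\mathbb{F}^{m_3})$.
Suppose that $\mathbf{e}_1,\dots,\mathbf{e}_{m_1}$, $\mathbf{f}_1,\dots,\mathbf{f}_{m_2}$, $\mathbf{g}_1,\dots,\mathbf{g}_{m_3}$ are orthonormal bases in $\mathbb{F}^{m_1}, \mathbb{F}^{m_2}, \mathbb{F}^{m_3}$
respectively, such that $\mathbf{e}_1,\dots,\mathbf{e}_p$, $\mathbf{f}_1,\dots,\mathbf{f}_q$, $\mathbf{g}_1,\dots,\mathbf{g}_r$ are bases of $\mathbf{X}, \mathbf{Y}, \mathbf{Z}$,
respectively. Then we can express $\tau$ as $\tau = \sum_{i=j=k=1}^{m_1,m_2,m_3} t_{i,j,k}\,\mathbf{e}_i\otimes\mathbf{f}_j\otimes\mathbf{g}_k$ and consider the
following linear operators.

1. The first operator $\tau(\mathbf{Y},\mathbf{Z}) : \mathbb{F}^{m_1} \to \mathbf{Y}\otimes\mathbf{Z}$ is constructed as follows. View
$P_{\mathbb{F}^{m_1}\otimes\mathbf{Y}\otimes\mathbf{Z}}(\tau)$ as a tensor in $\mathbb{F}^{m_1}\otimes\mathbf{Y}\otimes\mathbf{Z}$, i.e.

$$P_{\mathbb{F}^{m_1}\otimes\mathbf{Y}\otimes\mathbf{Z}}(\tau) = \sum_{i=j=k=1}^{m_1,q,r} t_{i,j,k}\,\mathbf{e}_i\otimes\mathbf{f}_j\otimes\mathbf{g}_k,$$

and then define for $\mathbf{x} \in \mathbb{F}^{m_1}$ the operator via

$$\tau(\mathbf{Y},\mathbf{Z})(\mathbf{x}) = \sum_{i=j=k=1}^{m_1,q,r} t_{i,j,k}\,\langle\mathbf{x},\mathbf{e}_i\rangle_1\,\mathbf{f}_j\otimes\mathbf{g}_k,$$

where as before $\langle\cdot,\cdot\rangle_1$ denotes the inner product in $\mathbb{F}^{m_1}$.

2. Analogously we proceed for $\tau(\mathbf{X},\mathbf{Z}) : \mathbb{F}^{m_2} \to \mathbf{X}\otimes\mathbf{Z}$. We view $P_{\mathbf{X}\otimes\mathbb{F}^{m_2}\otimes\mathbf{Z}}(\tau)$
as a tensor in $\mathbf{X}\otimes\mathbb{F}^{m_2}\otimes\mathbf{Z}$, i.e.,

$$P_{\mathbf{X}\otimes\mathbb{F}^{m_2}\otimes\mathbf{Z}}(\tau) = \sum_{i=j=k=1}^{p,m_2,r} t_{i,j,k}\,\mathbf{e}_i\otimes\mathbf{f}_j\otimes\mathbf{g}_k,$$

and then for any $\mathbf{y} \in \mathbb{F}^{m_2}$ we define the operator via

$$\tau(\mathbf{X},\mathbf{Z})(\mathbf{y}) = \sum_{i=j=k=1}^{p,m_2,r} t_{i,j,k}\,\langle\mathbf{y},\mathbf{f}_j\rangle_2\,\mathbf{e}_i\otimes\mathbf{g}_k.$$

3. Finally $\tau(\mathbf{X},\mathbf{Y}) : \mathbb{F}^{m_3} \to \mathbf{X}\otimes\mathbf{Y}$ is given as follows. View $P_{\mathbf{X}\otimes\mathbf{Y}\otimes\mathbb{F}^{m_3}}(\tau)$ as a
tensor in $\mathbf{X}\otimes\mathbf{Y}\otimes\mathbb{F}^{m_3}$, i.e.,

$$P_{\mathbf{X}\otimes\mathbf{Y}\otimes\mathbb{F}^{m_3}}(\tau) = \sum_{i=j=k=1}^{p,q,m_3} t_{i,j,k}\,\mathbf{e}_i\otimes\mathbf{f}_j\otimes\mathbf{g}_k.$$

Then for any $\mathbf{z} \in \mathbb{F}^{m_3}$, we define the operator via

$$\tau(\mathbf{X},\mathbf{Y})(\mathbf{z}) = \sum_{i=j=k=1}^{p,q,m_3} t_{i,j,k}\,\langle\mathbf{z},\mathbf{g}_k\rangle_3\,\mathbf{e}_i\otimes\mathbf{f}_j.$$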

We have the following theorem. 

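Theorem 4.3 Let $\tau \in \mathbb{F}^{m_1}\otimes\mathbb{F}^{m_2}\otimes\mathbb{F}^{m_3}$, $p \in \langle m_1\rangle$, $q \in \langle m_2\rangle$, $r \in
\langle m_3\rangle$. Then $\mathbf{U} \in \mathrm{Gr}(p,\mathbb{F}^{m_1})$, $\mathbf{V} \in \mathrm{Gr}(q,\mathbb{F}^{m_2})$, $\mathbf{W} \in \mathrm{Gr}(r,\mathbb{F}^{m_3})$ is a critical point of
$\|P_{\mathbf{X}\otimes\mathbf{Y}\otimes\mathbf{Z}}(\tau)\|^2$ on $\mathrm{Gr}(p,\mathbb{F}^{m_1})\otimes\mathrm{Gr}(q,\mathbb{F}^{m_2})\otimes\mathrm{Gr}(r,\mathbb{F}^{m_3})$ if and only if the following
conditions hold

1. $\mathbf{U}$ is spanned by some p left singular vectors of $\tau(\mathbf{V},\mathbf{W})$.

2. $\mathbf{V}$ is spanned by some q left singular vectors of $\tau(\mathbf{U},\mathbf{W})$.

3. $\mathbf{W}$ is spanned by some r left singular vectors of $\tau(\mathbf{U},\mathbf{V})$.

Proof. Since the critical points are the zeros of the first derivative, it is enough
to prove the necessary conditions for the function $\|P_{\mathbf{X}\otimes\mathbf{V}\otimes\mathbf{W}}(\tau)\|^2$. Considering this
as a function on $\mathrm{Gr}(p,\mathbb{R}^{m_1})$, Condition 1 then follows immediately by Corollary 4.2.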
The other conditions follow analogously. □ 

In the following we will describe an iterative procedure to compute the best
subspace tensor approximation. In order to find good starting values for $\mathbf{U} =
\mathbf{X}_0$, $\mathbf{V} = \mathbf{Y}_0$, $\mathbf{W} = \mathbf{Z}_0$ we make use of the SVD. As explained in §2 we can unfold
$\tau$ as a matrix $A_1$, say $m_1\times(m_2 m_3)$, by considering $\tau(1)$ as defined in (2.4). Then
we perform the SVD and use as approximation the corresponding p-dimensional subspace
$\mathbf{X}_0 \in \mathrm{Gr}(p,\mathbb{F}^{m_1})$ spanned by the left singular vectors of $A_1$ associated with the p largest
singular values. In a similar way we determine $\mathbf{Y}_0 \in \mathrm{Gr}(q,\mathbb{F}^{m_2})$, $\mathbf{Z}_0 \in \mathrm{Gr}(r,\mathbb{F}^{m_3})$.

To find the maximum in (4.2) we then apply a relaxation method. 

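Algorithm 4.4 Let $\tau \in \mathbb{F}^{m_1}\otimes\mathbb{F}^{m_2}\otimes\mathbb{F}^{m_3}$, $p \in \langle m_1\rangle$, $q \in \langle m_2\rangle$, $r \in \langle m_3\rangle$ and
starting values $\mathbf{X}_0 \in \mathrm{Gr}(p,\mathbb{F}^{m_1})$, $\mathbf{Y}_0 \in \mathrm{Gr}(q,\mathbb{F}^{m_2})$, $\mathbf{Z}_0 \in \mathrm{Gr}(r,\mathbb{F}^{m_3})$ be given.
Suppose that $(\mathbf{X}_i, \mathbf{Y}_i, \mathbf{Z}_i)$ have been computed. Then

1. $\mathbf{X}_{i+1}$ is obtained as the p-dimensional subspace corresponding to the left singular
vectors of $\tau(\mathbf{Y}_i,\mathbf{Z}_i)$ associated with the p largest singular values.

2. $\mathbf{Y}_{i+1}$ is obtained as the q-dimensional subspace corresponding to the left
singular vectors of $\tau(\mathbf{X}_{i+1},\mathbf{Z}_i)$ associated with the q largest singular values.

3. $\mathbf{Z}_{i+1}$ is obtained as the r-dimensional subspace corresponding to the left
singular vectors of $\tau(\mathbf{X}_{i+1},\mathbf{Y}_{i+1})$ associated with the r largest singular values.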
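
The starting values and the three relaxation steps can be sketched in NumPy for the real case. This is a hedged illustration, not the authors' implementation: the function names (`bsta`, `leading_left_singular_vectors`, and so on) and the fixed number of sweeps are hypothetical, and it is an assumption of this sketch that $\tau(\mathbf{Y},\mathbf{Z})$ is represented by the $m_1\times qr$ matrix whose left singular vectors lie in $\mathbb{R}^{m_1}$ (similarly for the other two operators). It also assumes $p \le qr$, $q \le pr$, $r \le pq$.

```python
import numpy as np

def unfold_first(T, j):
    """Matrix with rows indexed by mode j and columns by the other two modes."""
    return np.moveaxis(T, j, 0).reshape(T.shape[j], -1)

def leading_left_singular_vectors(M, k):
    """Orthonormal basis (columns) of the span of the k leading left singular vectors."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k]

def projected_norm_sq(T, X, Y, Z):
    """Squared norm of the projection of T onto X (x) Y (x) Z, cf. (4.2)."""
    core = np.einsum('ijk,ia,jb,kc->abc', T, X, Y, Z)
    return float(np.sum(core ** 2))

def bsta(T, p, q, r, sweeps=20):
    # Starting values from the SVDs of the three unfoldings of T
    X = leading_left_singular_vectors(unfold_first(T, 0), p)
    Y = leading_left_singular_vectors(unfold_first(T, 1), q)
    Z = leading_left_singular_vectors(unfold_first(T, 2), r)
    history = [projected_norm_sq(T, X, Y, Z)]
    m1, m2, m3 = T.shape
    for _ in range(sweeps):
        # Step 1: update X with Y, Z fixed
        M = np.einsum('ijk,jb,kc->ibc', T, Y, Z).reshape(m1, -1)
        X = leading_left_singular_vectors(M, p)
        # Step 2: update Y with the new X and the old Z
        M = np.einsum('ijk,ia,kc->jac', T, X, Z).reshape(m2, -1)
        Y = leading_left_singular_vectors(M, q)
        # Step 3: update Z with the new X and Y
        M = np.einsum('ijk,ia,jb->kab', T, X, Y).reshape(m3, -1)
        Z = leading_left_singular_vectors(M, r)
        history.append(projected_norm_sq(T, X, Y, Z))
    return X, Y, Z, history

rng = np.random.default_rng(0)
T = rng.standard_normal((6, 5, 4))
X, Y, Z, history = bsta(T, 2, 2, 2)
```

Each step maximizes the projected norm over one subspace with the other two fixed, so the values in `history` are non-decreasing and bounded by $\|\tau\|^2$. A practical version would stop once the increase falls below a tolerance rather than run a fixed number of sweeps.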

We have the following convergence result. 

Corollary 4.5 The subspaces $\mathbf{X}_i, \mathbf{Y}_i, \mathbf{Z}_i$, $i = 0, 1, \dots$ defined in Algorithm 4.4
converge to subspaces $\mathbf{U}, \mathbf{V}, \mathbf{W}$ which give a critical point of $\|P_{\mathbf{X}\otimes\mathbf{Y}\otimes\mathbf{Z}}(\tau)\|^2$.
Moreover, this critical point is a maximal point, with respect to any one variable, when
the other variables are fixed. Furthermore the following conditions hold.

1. $\mathbf{U}$ is spanned by the left singular vectors of $\tau(\mathbf{V},\mathbf{W})$ associated with the p
largest singular values.

2. $\mathbf{V}$ is spanned by the left singular vectors of $\tau(\mathbf{U},\mathbf{W})$ associated with the q
largest singular values.

3. $\mathbf{W}$ is spanned by the left singular vectors of $\tau(\mathbf{U},\mathbf{V})$ associated with the r
largest singular values.

In this section we have shown that the best subspace tensor approximation for
3-tensors is a generalization of the singular value decomposition. It is obvious how
this procedure can be extended to arbitrary d-tensors.

Unfortunately the described procedure is extremely expensive, since in every
step a singular value decomposition of a very large full matrix has to be performed.
In order to reduce the complexity, in the next section we consider a procedure that
is based on the recently suggested fast SVD [3].
5 Fast low rank 3-tensors approximations

In this section we generalize the algorithm outlined in [3] to the fast low rank tensor
approximation, abbreviated as FLRTA, for 3-tensors. Let $\mathcal{A} = [a_{i_1,i_2,i_3}] \in \mathbb{R}^{\ell_1\times\ell_2\times\ell_3}$
be a 3-tensor, where the dimensions $\ell_1, \ell_2, \ell_3$ are large. For each j = 1, 2, 3 we read
subtensors of $\mathcal{A}$ denoted by $\mathcal{C}_j = [c^{(j)}_{i_1,i_2,i_3}] \in \mathbb{R}^{\ell_{1,j}\times\ell_{2,j}\times\ell_{3,j}}$. We assume that $\mathcal{C}_j$
has the same number of coordinates as $\mathcal{A}$ in the j-th direction, and a small number
of coordinates in the other two directions. That is, $\ell_{j,j} = \ell_j$ and the other two
indices $\ell_{s,j}$, $s \in \{1,2,3\}\setminus\{j\}$, are of order $O(k)$, for j = 1, 2, 3. So $\mathcal{C}_j$ corresponds
to the j-section of the tensor $\mathcal{A}$. The small dimensions of $\mathcal{C}_j$ are $(\ell_{s_j,j}, \ell_{t_j,j})$ where
$\{s_j, t_j\} = \{1,2,3\}\setminus\{j\}$ for j = 1, 2, 3. Let $m_j := \ell_{s_j,j}\,\ell_{t_j,j}$ for j = 1, 2, 3.
To determine an approximation, we then look for a 6-tensor

$$\mathcal{V} = [v_{q_1,q_2,q_3,q_4,q_5,q_6}] \in \mathbb{R}^{\ell_{2,1}\times\ell_{3,1}\times\ell_{1,2}\times\ell_{3,2}\times\ell_{1,3}\times\ell_{2,3}}$$

and approximate the given tensor $\mathcal{A}$ by a tensor $\mathcal{B} = [b_{i_1,i_2,i_3}] \in \mathbb{R}^{\ell_1\times\ell_2\times\ell_3}$,
where we contract the 6 indices in $\mathcal{V}$ and the corresponding two indices $\{1,2,3\}\setminus\{j\}$
in $\mathcal{C}_j$ for j = 1, 2, 3, i.e., our approximation has the entries

$$b_{i_1,i_2,i_3} = \sum_{q_1=1}^{\ell_{2,1}}\sum_{q_2=1}^{\ell_{3,1}}\sum_{q_3=1}^{\ell_{1,2}}\sum_{q_4=1}^{\ell_{3,2}}\sum_{q_5=1}^{\ell_{1,3}}\sum_{q_6=1}^{\ell_{2,3}} v_{q_1,q_2,q_3,q_4,q_5,q_6}\,c^{(1)}_{i_1,q_1,q_2}\,c^{(2)}_{q_3,i_2,q_4}\,c^{(3)}_{q_5,q_6,i_3}. \tag{5.1}$$

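This approximation is equivalent to a so-called Tucker approximation [10]. Indeed,
we represent each tensor $\mathcal{C}_j$ by a matrix $C_j \in \mathbb{R}^{m_j\times\ell_j}$ that has the same number
of columns as the range of the j-th index of the tensor $\mathcal{A}$ and as number of rows
the product of the ranges of the remaining two small indices of $\mathcal{C}_j$, i.e.
$C_j = [c^{(j)}_{r,i_j}]_{r,i_j=1}^{m_j,\ell_j}$. Then $c^{(j)}_{r,i_j}$ is equal to the corresponding entry $c^{(j)}_{i_1,i_2,i_3}$, where the value
of r corresponds to the double index $(i_s, i_t)$ for $\{s,t\} = \{1,2,3\}\setminus\{j\}$.

Now with $\mathcal{U} = [u_{j_1,j_2,j_3}] \in \mathbb{R}^{m_1\times m_2\times m_3}$, the equivalent Tucker representation of
$\mathcal{B} = [b_{i_1,i_2,i_3}]$ is given by the entries

$$b_{i_1,i_2,i_3} = \sum_{j_1=1}^{m_1}\sum_{j_2=1}^{m_2}\sum_{j_3=1}^{m_3} u_{j_1,j_2,j_3}\,c^{(1)}_{j_1,i_1}\,c^{(2)}_{j_2,i_2}\,c^{(3)}_{j_3,i_3}, \quad (i_1,i_2,i_3) \in \langle\ell_1\rangle\times\langle\ell_2\rangle\times\langle\ell_3\rangle. \tag{5.2}$$

This formula is expressed commonly as

$$\mathcal{B} = \mathcal{U}\times_1 C_1\times_2 C_2\times_3 C_3. \tag{5.3}$$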
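
The Tucker product (5.3), with entries as in (5.2), can be written as a single contraction. The sketch below is illustrative only; the function name `tucker_product`, the sizes and the random data are assumptions.

```python
import numpy as np

def tucker_product(U, C1, C2, C3):
    """Entries b[i1, i2, i3] = sum over (j1, j2, j3) of
    u[j1, j2, j3] * C1[j1, i1] * C2[j2, i2] * C3[j3, i3], as in (5.2)."""
    return np.einsum('abc,ai,bj,ck->ijk', U, C1, C2, C3)

rng = np.random.default_rng(5)
m = (2, 3, 2)   # core sizes m_1, m_2, m_3 (illustrative)
l = (6, 5, 4)   # tensor sizes l_1, l_2, l_3 (illustrative)

U = rng.standard_normal(m)
C1, C2, C3 = (rng.standard_normal((mj, lj)) for mj, lj in zip(m, l))
B = tucker_product(U, C1, C2, C3)

# Coordinates stored by the Tucker form versus the full tensor
tucker_storage = U.size + C1.size + C2.size + C3.size
full_storage = B.size
```

Note the convention of (5.2): each matrix $C_j$ is $m_j\times\ell_j$, so the core index is the row index of $C_j$.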
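We now choose three subsets of the rows, columns and heights of $\mathcal{A}$

$$I \subset \langle\ell_1\rangle,\ \#I = p, \quad J \subset \langle\ell_2\rangle,\ \#J = q, \quad K \subset \langle\ell_3\rangle,\ \#K = r. \tag{5.4}$$

Let

$$\begin{aligned}
\mathcal{C}_1 &= \mathcal{A}_{\langle\ell_1\rangle,J,K} := [a_{i,j,k}] \in \mathbb{R}^{\ell_1\times q\times r}, && i \in \langle\ell_1\rangle,\ j \in J,\ k \in K,\\
\mathcal{C}_2 &= \mathcal{A}_{I,\langle\ell_2\rangle,K} := [a_{i,j,k}] \in \mathbb{R}^{p\times\ell_2\times r}, && i \in I,\ j \in \langle\ell_2\rangle,\ k \in K,\\
\mathcal{C}_3 &= \mathcal{A}_{I,J,\langle\ell_3\rangle} := [a_{i,j,k}] \in \mathbb{R}^{p\times q\times\ell_3}, && i \in I,\ j \in J,\ k \in \langle\ell_3\rangle,\\
S &= (\langle\ell_1\rangle\times J\times K)\cup(I\times\langle\ell_2\rangle\times K)\cup(I\times J\times\langle\ell_3\rangle).
\end{aligned} \tag{5.5}$$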
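We define $\mathcal{U}_b$ and $\mathcal{U}_{\mathrm{opt}}$ as in [3]:

$$\mathcal{U}_b = \arg\min_{\mathcal{U}} \sum_{(i,j,k)\in\langle\ell_1\rangle\times\langle\ell_2\rangle\times\langle\ell_3\rangle} \big(a_{i,j,k} - (\mathcal{U}\times_1 C_1\times_2 C_2\times_3 C_3)_{i,j,k}\big)^2, \tag{5.6}$$

$$\mathcal{U}_{\mathrm{opt}} = \arg\min_{\mathcal{U}} \sum_{(i,j,k)\in S} \big(a_{i,j,k} - (\mathcal{U}\times_1 C_1\times_2 C_2\times_3 C_3)_{i,j,k}\big)^2. \tag{5.7}$$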

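Instead of computing $\mathcal{U}_{\mathrm{opt}}$ we do the following approximations, as suggested in
[3] for the case q = p, r = . Unfold the tensor $\mathcal{A} = [a_{i,j,k}]$ in the direction 3 to
obtain the matrix $E = [e_{s,k}] \in \mathbb{R}^{(\ell_1\ell_2)\times\ell_3}$, where $e_{s,k} = a_{i,j,k}$ for the corresponding pair
of indices $(i,j) \in \langle\ell_1\rangle\times\langle\ell_2\rangle$. Then the set of indices $(i,j) \in I\times J$ corresponds to
the set of indices $L \subset \langle\ell_1\ell_2\rangle$, where $\#L = pq$. Denote by $E_{L,K}$ the submatrix of
$E$ which has row indices in $L$ and column indices in $K$. Let $E_{L,K}^\dagger \in \mathbb{R}^{r\times pq}$ be the
Moore-Penrose inverse of $E_{L,K}$. As in [3] we approximate the tensor $\mathcal{A}$ by

$$E_{\langle\ell_1\ell_2\rangle,K}\,E_{L,K}^\dagger\,E_{L,\langle\ell_3\rangle}. \tag{5.8}$$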

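For each $k \in K$ consider the matrix

$$F_k := \mathcal{A}_{\langle\ell_1\rangle,\langle\ell_2\rangle,k} = [a_{i,j,k}]_{i,j=1}^{\ell_1,\ell_2} \in \mathbb{R}^{\ell_1\times\ell_2}.$$

Next we approximate $F_k$ by $G_k := (F_k)_{\langle\ell_1\rangle,J}\,(F_k)_{I,J}^\dagger\,(F_k)_{I,\langle\ell_2\rangle}$. As in [3] we try
several random choices of $I, J, K$ with the cardinalities p, q, r respectively, with the
best preset condition numbers for the matrices $E_{L,K}$ and $(F_k)_{I,J}$ for $k \in K$.
Equivalently, we have that

$$\mathcal{A}_{\langle\ell_1\rangle,J,k}\,\mathcal{A}_{I,J,k}^\dagger\,\mathcal{A}_{I,\langle\ell_2\rangle,k} \tag{5.9}$$

is an approximation of $\mathcal{A}_{\langle\ell_1\rangle,\langle\ell_2\rangle,k}$. Replacing $\mathcal{A}_{\langle\ell_1\rangle,\langle\ell_2\rangle,k}$ appearing in (5.8) with the
expression that appears in (5.9), we obtain the approximation $\mathcal{B}$ of the form (5.3).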
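
The matrix approximation $G_k$ can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `skeleton_approx`, the test matrix and the index sets are hypothetical, the index sets are fixed by hand rather than selected by the random trials described above, and the matrix is built to have exact rank 2 so that the approximation reproduces it.

```python
import numpy as np

def skeleton_approx(F, I, J):
    """G = F[:, J] @ pinv(F[I, J]) @ F[I, :], the form used for G_k."""
    return F[:, J] @ np.linalg.pinv(F[np.ix_(I, J)]) @ F[I, :]

# A 30 x 20 matrix of exact rank 2 with entries 1 + i * j
i = np.arange(30.0)
j = np.arange(20.0)
F = np.outer(np.ones(30), np.ones(20)) + np.outer(i, j)

I = [0, 5]   # selected row indices, #I = p = 2
J = [0, 4]   # selected column indices, #J = q = 2
G = skeleton_approx(F, I, J)
rel_err = np.linalg.norm(F - G) / np.linalg.norm(F)

# Conditioning of the sampled block; a small value indicates a good choice of (I, J)
cond = np.linalg.cond(F[np.ix_(I, J)])
```

For a slice of a 3-tensor stored as a NumPy array `A`, the same function applied to `A[:, :, k]` gives the expression (5.9). When the sampled block has the same rank as `F`, the approximation is exact up to rounding; otherwise its quality depends on the choice of `I` and `J`.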